Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make wrapped C++ functions pickleable #30099

Merged
merged 39 commits into from
Feb 20, 2024
Merged

Conversation

rwgk
Copy link
Contributor

@rwgk rwgk commented Feb 8, 2024

Description

For pybind11 users:

This PR resolves pybind/pybind11#1261. For an example of the behavior difference, see the change in tests/test_pickling.py (for easy reference: that test was added with pybind/pybind11#3906).

Notes:

  • The repr() for pybind11 functions is made much more informative and truthful:

Before this PR:

<built-in method simple_callable of PyCapsule object at 0x...>

(PyCapsule objects do not have methods.)

With this PR:

<built-in method simple_callable of pybind11_detail_function_record_v1__gcc_libstdcpp_cxxabi1018 object at 0x...>
  • The increase in generated machine code size is very small (using -DCMAKE_BUILD_TYPE=MinSizeRel):

Before this PR:

------ pybind11_tests.cpython-311-x86_64-linux-gnu.so file size: 5346584

With this PR:

------ pybind11_tests.cpython-311-x86_64-linux-gnu.so file size: 5355256

Factor: 5355256 / 5346584 = 1.0016219702150009

Motivation for this work

PyCLIF-pybind11 has to satisfy many million tests. On the order of 10 tests fail because pybind11 functions are not pickleable. While this number is small, the failures present a major problem:

  1. Apache beam (which uses dill) requires that functions are pickleable: PyCLIF-pybind11 could not be used with beam. Similarly, some of the failing tests involve cloudpickle. It has to be expected that many of the projects depending on cloudpickle will run into issues, too.

  2. The estimated effort to retrofit workarounds in the ~10 failing use cases is "at least one month, possibly two". This is because the use cases tend to be very complex, and because owners need to be convinced that workarounds are meaningful (which they really are not, as this PR proves).

Note also that the effort behind this PR is about one week (slightly less).

Comparison with stock Python and other systems for Python-C++ bindings:




Everything below are very technical details:

Why is

m.def("simple_callable", []() { return 0; });

not pickleable before this PR?

The reason is that the simple_callable is implemented as shown by repr(simple_callable):

<built-in method simple_callable of PyCapsule object at 0x...>

To Python it appears to be a bound function of the type PyCapsule. (The reasons for this choice of implementation are deep and omitted here.)

pickle.dumps(simple_callable) is capable of pickling built-in methods. It does that in two steps:

simple_callable.__reduce_ex__(0)

produces this tuple (see __reduce_ex__ documentation):

(<built-in function getattr>, (<capsule object NULL at 0x...>, 'simple_callable'))

pickle then recurses, attempting to serialize <built-in function getattr> (no problem), <capsule object NULL at 0x...> (problem), and 'simple_callable' (no problem).

The attempt to serialize <capsule object NULL at 0x...> fails with this exception (copy-pasted from pytest output):

>       pickle.dumps(m.simple_callable)
E       TypeError: cannot pickle 'PyCapsule' object

While this is not the desired behavior when pickling simple_callable, there are very good reasons in general that PyCapsule objects are not pickleable. See, for example, here:

warnings.warn('Creating a new PyCapsule %s for a C data structure that may not be present in memory. Segmentation faults or other memory errors are possible.' % (name,), UnpicklingWarning)

The solution implemented in this PR is to

  1. replace the PyCapsule with a custom built-in type, wrapping the pybind11::detail::function_record C++ type the good-old way, with manually written bindings (implemented in include/pybind11/detail/function_record_pyobject.h),

  2. which then makes it possible to provide a __reduce_ex__ implementation to achieve the desired behavior.

The new repr(simple_callable) is:

<built-in method simple_callable of pybind11_detail_function_record_v1__gcc_libstdcpp_cxxabi1018 object at 0x...>

Side note: The Python type name is versioned to ensure ABI compatibility, but to maximize compatibility it is independent of PYBIND11_INTERNALS_VERSION.

simple_callable.__reduce_ex__(0) now produces:

(<built-in function getattr>, (<pybind11_detail_function_record_v1__gcc_libstdcpp_cxxabi1018 object at 0x...>, 'simple_callable'))

When pickle recurses to call __reduce_ex__ of the wrapped function_record object, the result is:

(<built-in function eval>, ("__import__('importlib').import_module('pybind11_tests.pickling')",))

Very simple! — Your author has to admit though that it took quite a while to figure this out. See the development history of this PR for the many dead ends explored before this solution was discovered.

To explain how this works in the pickle.load() step:

  1. eval("__import__('importlib').import_module('pybind11_tests.pickling')") produces a reference to the imported pybind11_tests.pickling module.

  2. getattr(imported_module, 'simple_callable') simply accesses what was just imported.

Note that __import__('pybind11_tests.pickling') only produces the top-level module (pybind11_tests in this case), as explained in the importlib.import_module() documentation.

Suggested changelog entry:

Ralf W. Grosse-Kunstleve added 22 commits February 7, 2024 21:13
… important); PyPy still generates 2 different kinds of errors: test_print failure on macOS with Python 3.8; Python 3.9, 3.10 have no leading `<`
… `PYBIND11_PLATFORM_ABI_ID_V4` to version `function_record_PyTypeObject` `tp_name`
…xes flaky behavior of test_gil_scoped.py). IncludeCleaner fixes.
…_record_PyTypeObject)` also with a simple `static bool first_call`
…t works with all other Python versions). Instead call `function_record_PyTypeObject_PyType_Ready()` from `get_internals()`.
…_detail_function_record_import_helper` (proof of concept).
…nals()` so that it is always called when `get_internals()` is called the first time.
…unction_record_pickle_helper()` call-once initializations triggered from `cpp_function::initialize_generic()`
…per()` call-once initialization triggered from `cpp_function::initialize_generic()`
rwgk pushed a commit to rwgk/stuff that referenced this pull request Feb 13, 2024
Ralf W. Grosse-Kunstleve added 5 commits February 13, 2024 13:06
Much simpler!

(Note that the `function_record_pickle_helper()` code is NOT removed in this commit.)

This approach was discovered in an attempt to solve the problem that
stubgen picks up `_function_record_pickle_helper_v1`.

For example (tensorflow_text/core/pybinds/tflite_registrar.pyi):

```diff
+from typing import Any
+
+def _function_record_pickle_helper_v1(*args, **kwargs) -> Any: ...
```
@rwgk rwgk changed the title WIP pickle_callable_pywrapcc Make wrapped C++ functions pickleable Feb 14, 2024
@rwgk rwgk marked this pull request as ready for review February 14, 2024 19:06
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_free_impl");
}

inline PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry this implementation violated the protocol: the returned function call should have created an object of the same type as the one that was reduce_ex-d. aka:

reduce = obj.__reduce_ex__()
type(reduce[0](*reduce[1])) == type(obj)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quoting from the Python documentation:

  • A callable object that will be called to create the initial version of the object.
  • A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument.

There is no mention that the callable has to be of the same type is the object being pickled. Such a restriction would be completely artificial and severely limiting:

There are types that for good reasons are not referenced from any importable module. There is no way that pickle.load() could find them.

This is the case for Boost.Python and pybind11 functions.

Going deeper into the weeds:

Python has __builtins__ as home for some similar types. I was thinking of adding the pybind11 function type to __builtins__, but I was told that future Python versions will make this impossible.

Even deeper:

We could create something like __pybind11__builtins__ (I'd actually call that __pybind11_internals__), which would have the benefit of solving problems like here, BUT, how does that spring into existence? — Python creates __builtins__ on startup. We don't have that luxury.

Also note that there may be multiple pybind11 versions / ABIs in use simultaneously in any given Python interpreter.

Conclusion: I don't think there is any practical way to implement a callable of the "correct" pybind11 function record type that pickle.load() could find.

Copy link

@rainwoodman rainwoodman Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is the type of the return value of the callable, which is supposed to be the "initial version of the object". From what I read the current approach is:

  1. function_recorder's reduce_ex callable returns a module object, even though there is a PyObject for function_record itself.
  2. Another Python side object's "callable" looks up the attr from the module object to return the callable. (I could not quite find the relevant implementation however).

I argue that step 1's callable should have returned the PyObject for function_record instead, because of the "returns initial value" requirement. As function_record has a name, and a scope, I think we should have enough information for that. The records can be either attached to a pybind11_builtin module, or the actual scope where the CCallable is attached.

Then step 2 can be adjusted accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Initial" here refers to what happens when the object is unpickled.

When the object is unpickled: the callable creates an object, then calls __getstate__ (if there is one) to mutate the "initial" object.


The step 2. is something we have no control over.

Because it is coded into the pickle implementation how it handles built-in method objects repr(m.simple_callable):

<built-in method simple_callable of pybind11_detail_function_record_v1__gcc_libstdcpp_cxxabi1018 object at 0x...>

Unless we want to completely change how pybind11 functions are implemented (nanobind actually did that!), we are forced to design the __reduce_ex__ of the function_record to produce:

A callable that returns an (intermediate) object for with getattr(obj, 'simple_callable)` returns our function object.

The intermediate object could be anything that has that behavior.

Additional important requirement: it needs to assume that the module with the function was not imported already, otherwise the pickle load step (we have to assume it's in a new process) will fail.

In other words, in or before the getattr(obj, 'simple_callable') step the module needs to be imported, and we cannot rely on any side-effects of importing the module before we trigger the import:

Yes, the function object has a name and a scope, but the type object does not have a scope.

This will be a problem even for nanobind, which doesn't use PyCFunction but has it's own nanobind.nb_func type. To stay with that concrete example: unless nanobind exists as an importable module, the pickle load stage cannot work. This is what I meant by the "spring into existence" problem mentioned before.

I was thinking of giving giving the function_record type an __init__ that could act as the callable in the pickle mechanism, but that's the exact same problem: how does it spring into existence in the pickle load stage?

Note that this PR solves that problem in a very minimalistic and clean fashion:

(<built-in function eval>, ("__import__('importlib').import_module('pybind11_tests.pickling')",))

It could be even better (avoid the eval) if we had something like __import_module__, which could then act as the callable directly:

(<built-in function __import_module__>, ("pybind11_tests.pickling",))

Unfortunately that does not exist. I actually tried under this PR to make such a callable myself, and to inject it into the rec->scope when a cpp_function is built (so that it springs into existence just in time), but that fell flat because stubgen picks it up (>500 TGP failures because we have ~130 auto-generated but checked-in stubgen files). See message of the commit that replaced the "pickle helper" callable with eval:

0e7bb10

I had _function_record_pickle_helper_abi_stamp as the name at that time. I also tried __function_record_pickle_helper_abi_stamp and __function_record_pickle_helper_abi_stamp__ but stubgen stubbornly picked it up no matter what. The eval approach was born out of an intense search for a solution that does not make it necessary to change stubgen, or worse, 130+ checked-in auto-generated files all over the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @rainwoodman for taking the time to go through this yesterday 1:1, and for correcting my understanding of your concern.

I captured the experimental code we worked on in 02b31b1 (exactly as it was when we left off).

Our conclusion is captured in the long comment added with bcdfc69. Could you please take another look?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comment and the test. This looks good!

@rwgk
Copy link
Contributor Author

rwgk commented Feb 15, 2024

Logging result of simple experiment: nanobind functions are not pickleable

==================== Test output for nanobind/tests:test_functions:
============================= test session starts ==============================
platform linux -- Python 3.11.7, pytest-7.3.2, pluggy-0.9.0
collected 46 items

test_functions.py F....................s.......F................

=================================== FAILURES ===================================
_________________________________ test_pickle __________________________________

    def test_pickle():
>       pickle.dumps(t.test_01, protocol=-1)
E       TypeError: cannot pickle nanobind.nb_func objects

test_functions.py:9: TypeError
_______________________________ test29_traceback _______________________________

    def test29_traceback():
        result = t.test_30(fail_fn)
        regexp = r'Traceback \(most recent call last\):\n.*\n  File "[^"]*", line 8, in fail_fn\n.*RuntimeError: Foo'
        matches = re.findall(regexp, result, re.MULTILINE | re.DOTALL)
>       assert len(matches) == 1
E       assert 0 == 1
E        +  where 0 = len([])

test_functions.py:266: AssertionError
=========================== short test summary info ============================
FAILED test_functions.py::test_pickle - TypeError: cannot pickle nanobind.nb_...
FAILED test_functions.py::test29_traceback - assert 0 == 1
=================== 2 failed, 43 passed, 1 skipped in 0.45s ====================
-- 2024-02-15 10:24:05 PST Forge runner: Test failed with exit code 1 while running on jfbjd13.prod.google.com
================================================================================
--- a/nanobind/tests/test_functions.py
+++ b/nanobind/tests/test_functions.py
@@ -1,9 +1,14 @@
 import test_functions_ext as t
+import pickle
 import pytest
 import sys
 import re


+def test_pickle():
+    pickle.dumps(t.test_01, protocol=-1)
+
+
 def fail_fn(): # used in test_30
     raise RuntimeError("Foo")

@rwgk
Copy link
Contributor Author

rwgk commented Feb 15, 2024

Logging result of simple experiment: SWIG functions are pickleable

[ RUN      ] PywrapTaskTest.testPickleSWIGFunction
repr(swig_function)='<built-in function MakeStatus>'
[       OK ] PywrapTaskTest.testPickleSWIGFunction
+  def testPickleSWIGFunction(self):
+    swig_function = pywrap_task.MakeStatus
+    print(f"{repr(swig_function)=}", flush=True)
+    serialized = pickle.dumps(swig_function)
+    deserialized = pickle.loads(serialized)
+

@rwgk
Copy link
Contributor Author

rwgk commented Feb 15, 2024

Cross-reference to simple experiment: Boost.Python functions are not pickleable

@rwgk
Copy link
Contributor Author

rwgk commented Feb 15, 2024

Logging result of simple experiment: Python functions (built-in & native) are pickleable (unless they are nested)

$ python3 python_functions_pickle_test.py
3.11.6 (main, Oct  8 2023, 05:06:43) [GCC 13.2.0]

repr(eval)='<built-in function eval>'
eval is pickleable

repr(signal._signal.getsignal)='<built-in function getsignal>'
signal._signal.getsignal is pickleable

repr(native_function)='<function native_function at 0x7f19db8a7d80>'
native_function is pickleable

repr(nested_native_function)='<function test_nested_native_function.<locals>.nested_native_function at 0x7f19db795f80>'
nested_native_function is NOT pickleable: Can't pickle local object 'test_nested_native_function.<locals>.nested_native_function'
$ cat python_functions_pickle_test.py
import sys
import pickle

print(sys.version)
print()

def test_built_in_living_in_builtins():
    print(f"{repr(eval)=}")
    serialized = pickle.dumps(eval)
    deserialized = pickle.loads(serialized)
    assert deserialized is eval
    print("eval is pickleable")
    print()

def test_built_in_living_in_extension():
    import signal
    print(f"{repr(signal._signal.getsignal)=}")
    serialized = pickle.dumps(signal._signal.getsignal)
    deserialized = pickle.loads(serialized)
    assert deserialized is signal._signal.getsignal
    print("signal._signal.getsignal is pickleable")
    print()

def native_function():
    pass

def test_native_function():
    print(f"{repr(native_function)=}")
    serialized = pickle.dumps(native_function)
    deserialized = pickle.loads(serialized)
    assert deserialized is native_function
    print("native_function is pickleable")
    print()

def test_nested_native_function():
    def nested_native_function():
        pass
    print(f"{repr(nested_native_function)=}")
    try:
        pickle.dumps(nested_native_function)
    except Exception as e:
        print("nested_native_function is NOT pickleable:", e, "\n")
    else:
        raise RuntimeError("Exception expected")
    print()

test_built_in_living_in_builtins()
test_built_in_living_in_extension()
test_native_function()
test_nested_native_function()

assert deserialized() == 20220426
assert deserialized is m.simple_callable

# UNUSUAL: A pickle roundtrip starting with `m.simple_callable.__self__` yields `m`:
Copy link

@rainwoodman rainwoodman Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hit a few keywords to hopefully make it more likely to land here from a google search:

# UNUSUAL: function record pickling roundtrip returns a module, not the function record object:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 1c24995

Thanks!

@rwgk rwgk merged commit de89591 into google:main Feb 20, 2024
59 checks passed
@rwgk rwgk deleted the pickle_callable_pywrapcc branch February 20, 2024 17:22
@rwgk rwgk added the could be backported Could be backported to pybind11 master label Feb 20, 2024
rwgk pushed a commit to rwgk/pybind11clif that referenced this pull request Feb 20, 2024
rwgk pushed a commit to google/clif that referenced this pull request Feb 23, 2024
…changes).

These test were added while working on google/pybind11clif#30099.

Before google/pybind11clif#30099, the `SimpleCallable` pickle tests were failing with PyCLIF-pybind11.

PiperOrigin-RevId: 608694227
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be backported Could be backported to pybind11 master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making a C++ function pickleable?
2 participants